Synopsis: Random Selection

Let's learn another antipattern by using an example of displaying advertisements on a website.

You’re writing a web application that displays advertisements. You’re supposed to choose a random ad on each viewing so that all your advertisers have an even chance of showing their ads and so that viewers don’t get bored seeing the same ad repeatedly.

Randomness

Things go well for the first few days, but the application gradually becomes more sluggish. A few weeks later, people are complaining that your website is too slow. You discover it’s not just psychological; you can measure a real difference in the page load time. Your readers are starting to lose interest, and traffic is declining.

Learning from past experiences, you first try to find the performance bottleneck using profiling tools and a test version of your database with a sample of the data. You measure the time it takes to load a web page, but, curiously, there are no problems with the performance in any of the SQL queries used to produce the page. Yet, the production website is getting slower and slower.

Finally, you realize that the database on your production website is much greater than the sample in your tests. You repeat your tests with a database of a similar size to the production data and find that it’s the ad-selection query that is causing the performance to drop. You find that with a greater number of ads to choose from, the performance of that query drops sharply. You’ve discovered the query that fails to scale, and that’s an essential first step.

How can you redesign the query that chooses random ads before your website loses its audience and, therefore, your sponsors?

Objective: Fetch a sample row#

It’s surprising how frequently we need an SQL query that returns a random result. This seems to go against the principles of repeatability and deterministic programming. However, it’s ordinary to ask for a sample from a large data set. The following are some examples:

  • Displaying rotating content, such as an advertisement or a news story, to highlight

  • Auditing a subset of records

  • Assigning incoming calls to available operators

  • Generating test data

It’s better to query the database for this sample as an alternative to fetching the entire data set into our application so that we can pick a sample from the set.

The objective is to write an efficient SQL query that returns only a random sample of data.

Legitimate uses of the antipattern#

We will study a solution “sort-by-random” in the next lesson. The inefficiency of the sort-by-random solution is tolerable if our data set is bound to be small. For example, we could use a random method for assigning a programmer to fix a given bug. It’s safe to assume that we’ll never have so many programmers that we need to use a highly scalable method for choosing a random person.

Another example could be selecting a random U.S. state from a list of the 50 states, which is a list of modest size and not likely to grow during our lifetimes.

Solution: Use Columns Unambiguously
Antipattern: Sort Data Randomly
Mark as Completed
Report an Issue